An Optimization Based Framework for Dynamic Batch Mode Active Learning
Abstract
Active learning techniques have gained popularity for reducing the human effort needed to annotate data instances when inducing a classifier. When faced with large quantities of unlabeled data, such algorithms automatically select the salient and representative samples for manual annotation. Batch mode active learning schemes have recently been proposed to select a batch of data instances simultaneously, rather than updating the classifier after every single query. While numerical optimization strategies seem a natural choice for this problem (selecting a batch of points so that a given objective criterion is optimized), many of the proposed approaches are based on greedy heuristics. Moreover, all existing work on batch mode active learning assumes that the batch size is given as an input to the problem. In this work, we propose a novel optimization-based strategy that dynamically decides both the batch size and the specific points to be queried, based on the particular data stream in question. Our results on the widely used VidTIMIT and MBGC biometric datasets corroborate the efficacy of the framework in adaptively identifying the batch size and the particular data points to be selected for manual annotation in any batch mode active learning application.
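The idea that the batch size can be an output of an optimization rather than an input can be illustrated with a toy objective. The sketch below is not the method proposed in the paper, whose formulation is not given in this abstract. It assumes a hypothetical criterion (total predictive entropy of the selected points minus a fixed per-label cost), and the names `select_dynamic_batch` and `cost_per_label` are invented for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_dynamic_batch(pool_probs, cost_per_label):
    """Return indices of the subset B that maximizes
    sum(entropy(x) for x in B) - cost_per_label * len(B).

    Hypothetical objective, assumed for illustration only. Because it is a
    sum of independent per-point terms, the optimum keeps exactly the points
    whose entropy exceeds cost_per_label, so the batch size falls out of the
    optimization instead of being fixed in advance.
    """
    scores = [entropy(p) for p in pool_probs]
    batch = [i for i, s in enumerate(scores) if s > cost_per_label]
    # Order the selected points from most to least uncertain.
    batch.sort(key=lambda i: scores[i], reverse=True)
    return batch


# Predicted class probabilities for four unlabeled points (made-up values).
pool = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01], [0.6, 0.4]]
batch = select_dynamic_batch(pool, cost_per_label=0.5)
print(batch, len(batch))
```

With this toy criterion, a pool of mostly confident predictions yields a small batch (possibly empty) and a pool of uncertain predictions yields a large one. A realistic criterion would also account for redundancy among the selected points, which makes the objective non-separable and the optimization correspondingly harder.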
Similar resources
Active Instance Sampling via Matrix Partition
Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be f...
Classification Active Learning Based on Mutual Information
Selecting a subset of samples to label from a large pool of unlabeled data points, such that a sufficiently accurate classifier is obtained using a reasonably small training set, is a challenging yet critical problem. It is challenging because solving it involves cumbersome combinatorial computations, and critical because labeling is an expensive and time-consuming task, hence ...
Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization
Active learning can lead to a dramatic reduction in labeling effort. However, in many practical implementations (such as crowdsourcing, surveys, high-throughput experimental design), it is preferable to query labels for batches of examples to be labelled in parallel. While several heuristics have been proposed for batch-mode active learning, little is known about their theoretical performance. ...
Joint Transfer and Batch-mode Active Learning
Active learning and transfer learning are two different methodologies that address the common problem of insufficient labels. Transfer learning addresses this problem by using the knowledge gained from a related and already labeled data source, whereas active learning focuses on selecting a small set of informative samples for manual annotation. Recently, there has been much interest in develop...
Discriminative Batch Mode Active Learning
Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at one time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative un...